Learning Neuro-symbolic Programs for Language Guided Robot Manipulation
Given a natural language instruction and an input scene, our goal is to train
a model to output a manipulation program that can be executed by the robot.
Prior approaches for this task have one of the following limitations: (i) they
rely on hand-coded symbols for concepts, limiting generalization beyond those
seen during training [1]; (ii) they infer action sequences from instructions but
require dense sub-goal supervision [2]; or (iii) they lack the semantics required
for the deeper object-centric reasoning inherent in interpreting complex
instructions [3]. In contrast, our approach handles both linguistic and
perceptual variations, is end-to-end trainable, and requires no intermediate
supervision. The
proposed model uses symbolic reasoning constructs that operate on a latent
neural object-centric representation, allowing for deeper reasoning over the
input scene. Central to our approach is a modular structure consisting of a
hierarchical instruction parser and an action simulator to learn disentangled
action representations. Our experiments in a simulated environment with a 7-DOF
manipulator, covering instructions with varying numbers of steps and scenes
with different numbers of objects, demonstrate that our model is robust to such
variations and significantly outperforms baselines, particularly in the
generalization settings. The code, dataset and experiment videos are available
at https://nsrmp.github.io.
Comment: International Conference on Robotics and Automation (ICRA), 2023
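To make the modular structure concrete, here is a minimal Python sketch of the pipeline the abstract describes: a parser maps the instruction to a program, and an action simulator executes each program step over a latent object-centric scene representation. This is an illustrative sketch only, not the authors' implementation; all names (SceneObject, parse_instruction, simulate_action, execute) are hypothetical stand-ins.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SceneObject:
        embedding: List[float]  # latent neural object-centric representation

    def parse_instruction(instruction: str) -> List[str]:
        # Stand-in for the hierarchical instruction parser: here the "program"
        # is just the instruction's token sequence.
        return instruction.lower().split()

    def simulate_action(scene: List[SceneObject], step: str) -> List[SceneObject]:
        # Stand-in for the learned action simulator: a real model would
        # transform the object embeddings to reflect the action's effect.
        return scene

    def execute(instruction: str, scene: List[SceneObject]) -> List[SceneObject]:
        # Symbolic control flow over latent neural representations: the program
        # runs step by step, with no hand-coded concept symbols and no dense
        # sub-goal supervision.
        for step in parse_instruction(instruction):
            scene = simulate_action(scene, step)
        return scene

The point of the factorization is that the parser and simulator can be trained jointly from instruction/scene pairs, which is what lets the action representations stay disentangled from any fixed symbol vocabulary.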
Tracking what matters: A decision-variable account of human behavior in bandit tasks
We study human learning and decision-making in tasks with probabilistic rewards. Recent studies of a 2-armed bandit task find that a modification of classical Q-learning algorithms, with outcome-dependent learning rates, explains behavior better than constant learning rates. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this policy-learning perspective, asymmetric learning can be reinterpreted as increasing confidence in the preferred choice. We provide specific update rules for incorporating partial feedback (outcomes on chosen arms) and complete feedback (outcomes on chosen and unchosen arms), and show that our model consistently outperforms previously proposed models on a range of datasets. Our model and update rules also add nuance to previous findings of perseverative behavior in bandit tasks; we show evidence of outcome-dependent choice perseveration, i.e., humans persevere in their choices unless contradictory evidence is presented.
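To make the contrast concrete, here is a minimal Python sketch of the two model families for a 2-armed bandit with partial feedback. The update rules below are generic delta rules chosen for illustration; the paper's exact decision-variable updates (and its complete-feedback rule) are not reproduced here, and all function names and parameter values are hypothetical.

    import random

    def asymmetric_q_learning(rewards, alpha_pos=0.4, alpha_neg=0.1, n_trials=200):
        # Q-learning with outcome-dependent learning rates: positive prediction
        # errors are learned from faster than negative ones.
        q = [0.5, 0.5]
        choices = []
        for _ in range(n_trials):
            c = 0 if q[0] >= q[1] else 1              # greedy choice, for simplicity
            r = float(random.random() < rewards[c])   # Bernoulli reward on chosen arm
            delta = r - q[c]
            q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
            choices.append(c)
        return choices

    def decision_variable_learner(rewards, alpha=0.2, n_trials=200):
        # Directly track a single signed quantity v: the preference for arm 0
        # over arm 1 (the decision variable underlying choice).
        v = 0.0
        choices = []
        for _ in range(n_trials):
            c = 0 if v >= 0 else 1
            r = float(random.random() < rewards[c])
            # Partial feedback: a rewarded choice pushes v toward the chosen
            # arm (growing confidence in the preferred option); an unrewarded
            # choice pushes it away.
            sign = 1.0 if c == 0 else -1.0
            v += alpha * sign * (2.0 * r - 1.0)
            choices.append(c)
        return choices

    # Example usage: arm 0 pays off 70% of the time, arm 1 only 30%.
    print(decision_variable_learner(rewards=[0.7, 0.3])[-10:])

Under this framing, apparent learning-rate asymmetry falls out of tracking one decision variable rather than two action values, which is the reinterpretation the abstract proposes.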